Skip to content

Conversation

@eendebakpt
Copy link
Contributor

@eendebakpt eendebakpt commented Oct 13, 2024

We make concurrent iteration over itertools.pairwise safe by:

  • Using _PyObject_IsUniquelyReferenced for re-use of the result type
  • Holding a reference to po->it instead of a borrowed reference
  • Obtaining a reference to po->old early.

Also see #123848

To make pairwise_next safe we need to prevent concurrent iterators from accessing the cleared po->it

  • Idea 1: In the first line of pairwise_next replace PyObject *it = po->it with PyObject *it = Py_XInRef(po->it);.
    Then every thread will have a true reference to the iterator, and not a borrowed reference.
    Simple, but overhead is an incref/decref in the common case
  • Idea 2a: Do not clear po->it. We then need another way to signal that the iterator has been exhausted.
    Use another variable "int iterator_exhausted" in the pairwise object. Requires a bit more memory
    for each pairwise object, but the check is a simple int comparison.
  • Idea 2b: Do not clear po->it. We then need another way to signal that the iterator has been exhausted.
    Use po->old to set a sentinel. Requires a global sentinel in the module (or a sentinel per thread).
    In the top of pairwise_next we would have
PyObject *old = Py_XNewRef(po->old);
if (old == sentinel) {
      // iterator was exhausted earlier, return NULL
      Py_DECREF(old)
      return NULL;
}
if (old == NULL) {
      // first call to pairwise_next, normal code
      ...
}

An issue with this approach is that at the end of pairwise_next there is the line: Py_XSETREF(po->old, new);
So one thread can set the sentinel, but in another thread the value could be overwritten.
Maybe this is ok: this will not cause crashes, and we do not have to guarantee correct behaviour for
the free-threading build.
This approach does not require additional increfs/decrefs, but it does require setting up the sentinel.

In this PR we work out the first option.

@rhettinger
Copy link
Contributor

Can I suggest that you start with enumerate() rather than pairwise(). The former is a built-in, so more people care about it and your edit will get more attention, feedback, and buy-in.

Also, I would like to keep the latter as simple as possible for now. I'm evaluating a proposal to add prefix and suffix options to pairwise(). This would be much harder to do after these edits.

@eendebakpt
Copy link
Contributor Author

Can I suggest that you start with enumerate() rather than pairwise(). The former is a built-in, so more people care about it and

Sure! I created a PR for enumerate at #125734. For reversed there was an earlier PR #120971 where your review was requested. Would you have time to review that one?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants